Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

Authors

  • Feng Niu
  • Christopher Ré
  • AnHai Doan
  • Jude W. Shavlik
Abstract

Over the past few years, Markov Logic Networks (MLNs) have emerged as a powerful AI framework that combines statistical and logical reasoning. They have been applied to a wide range of data management problems, such as information extraction, ontology matching, and text mining, and have become a core technology underlying several major AI projects. Because of their growing popularity, MLNs are part of several research programs around the world. None of the existing MLN implementations, however, scales to large data sets. This lack of scalability is now a key bottleneck that prevents the widespread application of MLNs to real-world data management problems. In this paper we consider how to leverage RDBMSes to solve this problem. We consider Alchemy, the state-of-the-art MLN implementation currently in wide use. We first develop bTuffy, a system that implements Alchemy in an RDBMS. We show that bTuffy already scales to much larger datasets than Alchemy, but suffers from a sequential processing problem (inherent in Alchemy). We then propose cTuffy, which makes better use of the RDBMS’s set-at-a-time processing ability. We show that this produces dramatic benefits: on all four benchmarks cTuffy dominates both Alchemy and bTuffy. Moreover, on the complex entity resolution benchmark cTuffy finds a solution in minutes, while Alchemy spends hours unsuccessfully. We summarize the lessons we learned on how to design AI algorithms to take advantage of RDBMSes and how to extend RDBMSes to work better for AI algorithms.

Similar resources

Towards Incremental Grounding in Tuffy

Markov Logic Networks (MLNs) have become a powerful framework in logical and statistical modeling. However, most current MLN implementations are in-memory and cannot scale up to large data sets. Only recently has Tuffy addressed the scalability issue by using a pure RDB-based implementation. Inference in Tuffy can be divided into two stages: grounding and search. The grounding stage subst...

Scaling Inference for Markov Logic with a Task-Decomposition Approach

Motivated by applications in large-scale knowledge base construction, we study the problem of scaling up a sophisticated statistical inference framework called Markov Logic Networks (MLNs). Our approach, Felix, uses the idea of Lagrangian relaxation from mathematical programming to decompose a program into smaller tasks while preserving the joint-inference property of the original ...

Felix: Scaling up Global Statistical Information Extraction Using an Operator-based Approach

To support the next generation of sophisticated information extraction (IE) applications, several researchers have proposed frameworks that integrate SQL-like languages with statistical reasoning. While these frameworks demonstrate impressive quality on small IE tasks, they currently do not scale to enterprise-sized tasks. To enable the next generation of IE, a promising approach is to improve ...

RockIt: Exploiting Parallelism and Symmetry for MAP Inference in Statistical Relational Models

ROCKIT is a maximum a-posteriori (MAP) query engine for statistical relational models. MAP inference in graphical models is an optimization problem which can be compiled to integer linear programs (ILPs). We describe several advances in translating MAP queries to ILP instances and present the novel meta-algorithm cutting plane aggregation (CPA). CPA exploits local context-specific symmetries an...

Evidence-Based Clustering for Scalable Inference in Markov Logic

Markov Logic is a powerful representation that unifies first-order logic and probabilistic graphical models. However, scaling up inference in Markov Logic Networks (MLNs) is extremely challenging. Standard graphical model inference algorithms operate on the propositional Markov network obtained by grounding the MLN and do not scale well as the number of objects in the real-world domain increases...

Journal:
  • PVLDB

Volume 4

Publication date: 2011